{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c0de1e6a-61f7-4f99-a2fd-1461902ab36a",
   "metadata": {},
   "source": [
    "# Sync API\n",
    "\n",
    "Reproducibility is critical for AI. For code, it's easy to keep track of changes using GitHub or GitLab.\n",
    "For data, it's not as easy. Most of the time, we're manually writing complicated data tracking code, wrestling with an external tool, and dealing with expensive duplicate snapshot copies with low granularity.\n",
    "\n",
    "With most other vector databases, if we load the wrong data (or make a similar mistake), we have to blow away the index, correct the mistake, and then completely rebuild it. It's **really difficult** to roll back to an earlier state, and any such corrective action **destroys historical data and evidence**, which may be useful down the line to debug and diagnose issues.\n",
    "\n",
    "To our knowledge, LanceDB is the first and only vector database that supports full reproducibility and rollbacks natively.\n",
    "Taking advantage of the Lance columnar data format, LanceDB supports:\n",
    "- Automatic versioning\n",
    "- Instant rollback\n",
    "- Appends, updates, deletions\n",
    "- Schema evolution\n",
    "\n",
    "This makes auditing, tracking, and reproducibility a breeze!\n",
    "\n",
    "Let's see how this all works."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cafebbce-d324-485d-90ec-503695875f47",
   "metadata": {},
   "source": [
    "## Pickle Rick!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14605311",
   "metadata": {},
   "source": [
    "Let's first prepare the data. We will be using a CSV file with a bunch of quotes from Rick and Morty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c02976c7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-12-17 11:54:43--  http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv\n",
      "Resolving vectordb-recipes.s3.us-west-2.amazonaws.com (vectordb-recipes.s3.us-west-2.amazonaws.com)... 52.92.138.34, 3.5.82.160, 52.218.236.161, ...\n",
      "Connecting to vectordb-recipes.s3.us-west-2.amazonaws.com (vectordb-recipes.s3.us-west-2.amazonaws.com)|52.92.138.34|:80... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 8236 (8.0K) [text/csv]\n",
      "Saving to: ‘rick_and_morty_quotes.csv.1’\n",
      "\n",
      "rick_and_morty_quot 100%[===================>]   8.04K  --.-KB/s    in 0s      \n",
      "\n",
      "2024-12-17 11:54:43 (77.8 MB/s) - ‘rick_and_morty_quotes.csv.1’ saved [8236/8236]\n",
      "\n",
      "id,author,quote\n",
      "1,Rick,\" Morty, you got to come on. You got to come with me.\"\n",
      "2,Morty,\" Rick, what’s going on?\"\n",
      "3,Rick,\" I got a surprise for you, Morty.\"\n",
      "4,Morty,\" It’s the middle of the night. What are you talking about?\"\n",
      "5,Rick,\" I got a surprise for you.\"\n",
      "6,Morty,\" Ow! Ow! You’re tugging me too hard.\"\n",
      "7,Rick,\" I got a surprise for you, Morty.\"\n",
      "8,Rick,\" What do you think of this flying vehicle, Morty? I built it out of stuff I found in the garage.\"\n",
      "9,Morty,\" Yeah, Rick, it’s great. Is this the surprise?\"\n"
     ]
    }
   ],
   "source": [
    "!wget http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv\n",
    "!head rick_and_morty_quotes.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "533c5f58",
   "metadata": {},
   "source": [
    "Let's load this into a pandas dataframe.\n",
    "\n",
    "It has three columns: a quote id, the first name of the quote's author, and the quote string:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ee1443e3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>author</th>\n",
       "      <th>quote</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Rick</td>\n",
       "      <td>Morty, you got to come on. You got to come wi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Rick, what’s going on?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Rick</td>\n",
       "      <td>I got a surprise for you, Morty.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Morty</td>\n",
       "      <td>It’s the middle of the night. What are you ta...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Rick</td>\n",
       "      <td>I got a surprise for you.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id author                                              quote\n",
       "0   1   Rick   Morty, you got to come on. You got to come wi...\n",
       "1   2  Morty                             Rick, what’s going on?\n",
       "2   3   Rick                   I got a surprise for you, Morty.\n",
       "3   4  Morty   It’s the middle of the night. What are you ta...\n",
       "4   5   Rick                          I got a surprise for you."
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv(\"rick_and_morty_quotes.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e74818f-109e-4b09-b5f8-dd1875c512e3",
   "metadata": {},
   "source": [
    "We'll start by installing LanceDB and opening a local connection:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "fa27ab30",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install lancedb -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f57d988-56b9-4384-8a7b-000d5f91034a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import lancedb\n",
    "db = lancedb.connect(\"~/.lancedb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ba9ffac-c779-49e3-91a7-f1c00f3fda41",
   "metadata": {},
   "source": [
    "Creating a LanceDB table from a pandas dataframe is straightforward using `create_table`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "bd981f6d-b921-4b1d-b63a-6c1d59f3a51d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>author</th>\n",
       "      <th>quote</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Rick</td>\n",
       "      <td>Morty, you got to come on. You got to come wi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Rick, what’s going on?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Rick</td>\n",
       "      <td>I got a surprise for you, Morty.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Morty</td>\n",
       "      <td>It’s the middle of the night. What are you ta...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Rick</td>\n",
       "      <td>I got a surprise for you.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id author                                              quote\n",
       "0   1   Rick   Morty, you got to come on. You got to come wi...\n",
       "1   2  Morty                             Rick, what’s going on?\n",
       "2   3   Rick                   I got a surprise for you, Morty.\n",
       "3   4  Morty   It’s the middle of the night. What are you ta...\n",
       "4   5   Rick                          I got a surprise for you."
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.drop_table(\"rick_and_morty\", ignore_missing=True)\n",
    "table = db.create_table(\"rick_and_morty\", df)\n",
    "table.head().to_pandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38d055be-ae3e-4190-b1cf-abf14cdf8975",
   "metadata": {},
   "source": [
    "## Updates"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "842550fb-da81-44ea-9e98-d5dbaa6916c7",
   "metadata": {},
   "source": [
    "Now, since Rick is the smartest man in the multiverse, he deserves to have his quotes attributed to his full name: Richard Daniel Sanchez.\n",
    "\n",
    "This can be done via `LanceTable.update`. It needs two arguments:\n",
    "\n",
    "1. A `where` string filter (SQL syntax) to determine the rows to update\n",
    "2. A dict of `values` where the keys are the column names to update and the values are the new values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "9eac4708-a8c4-49aa-bc13-8e60c5bf34a0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>author</th>\n",
       "      <th>quote</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Rick, what’s going on?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>Morty</td>\n",
       "      <td>It’s the middle of the night. What are you ta...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Ow! Ow! You’re tugging me too hard.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Yeah, Rick, it’s great. Is this the surprise?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11</td>\n",
       "      <td>Morty</td>\n",
       "      <td>What?! A bomb?!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>94</th>\n",
       "      <td>80</td>\n",
       "      <td>Richard Daniel Sanchez</td>\n",
       "      <td>There you are, Morty. Listen to me. I got an ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95</th>\n",
       "      <td>82</td>\n",
       "      <td>Richard Daniel Sanchez</td>\n",
       "      <td>It’s pretty obvious, Morty. I froze him. Now ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>96</th>\n",
       "      <td>84</td>\n",
       "      <td>Richard Daniel Sanchez</td>\n",
       "      <td>Do you have any concept of how much higher th...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>97</th>\n",
       "      <td>86</td>\n",
       "      <td>Richard Daniel Sanchez</td>\n",
       "      <td>I’ll do it later, Morty. He’ll be fine. Let’s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98</th>\n",
       "      <td>97</td>\n",
       "      <td>Richard Daniel Sanchez</td>\n",
       "      <td>There she is. All right. Come on, Morty. Let’...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>99 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    id                  author  \\\n",
       "0    2                   Morty   \n",
       "1    4                   Morty   \n",
       "2    6                   Morty   \n",
       "3    9                   Morty   \n",
       "4   11                   Morty   \n",
       "..  ..                     ...   \n",
       "94  80  Richard Daniel Sanchez   \n",
       "95  82  Richard Daniel Sanchez   \n",
       "96  84  Richard Daniel Sanchez   \n",
       "97  86  Richard Daniel Sanchez   \n",
       "98  97  Richard Daniel Sanchez   \n",
       "\n",
       "                                                quote  \n",
       "0                              Rick, what’s going on?  \n",
       "1    It’s the middle of the night. What are you ta...  \n",
       "2                 Ow! Ow! You’re tugging me too hard.  \n",
       "3       Yeah, Rick, it’s great. Is this the surprise?  \n",
       "4                                     What?! A bomb?!  \n",
       "..                                                ...  \n",
       "94   There you are, Morty. Listen to me. I got an ...  \n",
       "95   It’s pretty obvious, Morty. I froze him. Now ...  \n",
       "96   Do you have any concept of how much higher th...  \n",
       "97   I’ll do it later, Morty. He’ll be fine. Let’s...  \n",
       "98   There she is. All right. Come on, Morty. Let’...  \n",
       "\n",
       "[99 rows x 3 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.update(where=\"author='Rick'\", values={\"author\": \"Richard Daniel Sanchez\"})\n",
    "table.to_pandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac6499ce-af6d-4934-9051-be5f159ce623",
   "metadata": {},
   "source": [
    "## Schema evolution"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0402226b-6d0c-41c5-9257-069c4bf16825",
   "metadata": {},
   "source": [
    "OK, this is a vector database, so we need actual vectors.\n",
    "We'll use sentence transformers here to avoid having to deal with API keys."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85db4ed9-8f80-4b56-9867-1381fa1c4c7d",
   "metadata": {},
   "source": [
    "Let's load the `all-MiniLM-L6-v2` model and embed the quotes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "998f4eb5-31cd-49ae-9f7c-2ec4d6652ef6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sentence_transformers import SentenceTransformer\n",
    "model = SentenceTransformer(\"all-MiniLM-L6-v2\", device=\"cpu\")\n",
    "vectors = model.encode(df.quote.values.tolist(),\n",
    "                       convert_to_numpy=True,\n",
    "                       normalize_embeddings=True).tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "539e2a0e-529b-439b-ba8c-a388907c4860",
   "metadata": {},
   "source": [
    "We can then convert the vectors into a pyarrow Table and merge it into the LanceDB table.\n",
    "\n",
    "For the merge to work successfully, we need to have an overlapping column. Here the natural choice is to use the id column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ccbea593-85cf-484c-989f-9836a31c7906",
   "metadata": {},
   "outputs": [],
   "source": [
    "from lance.vector import vec_to_table\n",
    "import numpy as np\n",
    "import pyarrow as pa"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "727c8230-7e41-436a-8666-60ee46e7041b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>vector</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[-0.10369808, -0.038807657, -0.07471153, -0.05...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[-0.11813704, -0.0533092, 0.025554786, -0.0242...</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[-0.09807682, -0.035231438, -0.04206024, -0.06...</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[0.032292824, 0.038136397, 0.013615396, 0.0335...</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[-0.050369408, -0.0043397923, 0.013419108, -0....</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                              vector  id\n",
       "0  [-0.10369808, -0.038807657, -0.07471153, -0.05...   1\n",
       "1  [-0.11813704, -0.0533092, 0.025554786, -0.0242...   2\n",
       "2  [-0.09807682, -0.035231438, -0.04206024, -0.06...   3\n",
       "3  [0.032292824, 0.038136397, 0.013615396, 0.0335...   4\n",
       "4  [-0.050369408, -0.0043397923, 0.013419108, -0....   5"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "embeddings = vec_to_table(vectors)\n",
    "embeddings = embeddings.append_column(\"id\", pa.array(np.arange(len(table))+1))\n",
    "embeddings.to_pandas().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "518da48d-6481-4c1e-8ba4-800d5e0542cf",
   "metadata": {},
   "source": [
    "And now we'll use the `LanceTable.merge` method to add the vector column to the LanceTable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a4326a70-9863-47e8-8f3f-565e35d558cf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>author</th>\n",
       "      <th>quote</th>\n",
       "      <th>vector</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Rick, what’s going on?</td>\n",
       "      <td>[-0.11813704, -0.0533092, 0.025554786, -0.0242...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>Morty</td>\n",
       "      <td>It’s the middle of the night. What are you ta...</td>\n",
       "      <td>[0.032292824, 0.038136397, 0.013615396, 0.0335...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Ow! Ow! You’re tugging me too hard.</td>\n",
       "      <td>[-0.035019904, -0.070963725, 0.003859435, -0.0...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Yeah, Rick, it’s great. Is this the surprise?</td>\n",
       "      <td>[-0.12578955, -0.019364933, 0.01606114, -0.082...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11</td>\n",
       "      <td>Morty</td>\n",
       "      <td>What?! A bomb?!</td>\n",
       "      <td>[0.0018287548, 0.07033146, -0.023754105, 0.047...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id author                                              quote  \\\n",
       "0   2  Morty                             Rick, what’s going on?   \n",
       "1   4  Morty   It’s the middle of the night. What are you ta...   \n",
       "2   6  Morty                Ow! Ow! You’re tugging me too hard.   \n",
       "3   9  Morty      Yeah, Rick, it’s great. Is this the surprise?   \n",
       "4  11  Morty                                    What?! A bomb?!   \n",
       "\n",
       "                                              vector  \n",
       "0  [-0.11813704, -0.0533092, 0.025554786, -0.0242...  \n",
       "1  [0.032292824, 0.038136397, 0.013615396, 0.0335...  \n",
       "2  [-0.035019904, -0.070963725, 0.003859435, -0.0...  \n",
       "3  [-0.12578955, -0.019364933, 0.01606114, -0.082...  \n",
       "4  [0.0018287548, 0.07033146, -0.023754105, 0.047...  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.merge(embeddings, left_on=\"id\")\n",
    "table.head().to_pandas()"
   ]
  },
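  {
   "cell_type": "markdown",
   "id": "toy-merge-sketch-note",
   "metadata": {},
   "source": [
    "Notice that the table rows are no longer in CSV order after the update, yet every quote still ends up with its own vector. That's what the shared `id` column buys us. As a rough illustration only (this is plain Python, not how LanceDB implements `merge`, and `toy_merge` is a made-up helper), a key-based left join looks something like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "toy-merge-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy left join on a shared key column (illustration only, hypothetical helper).\n",
    "def toy_merge(left, right, on):\n",
    "    # Index the right-hand rows by key so the row order on the left doesn't matter.\n",
    "    lookup = {r[on]: r for r in right}\n",
    "    merged = []\n",
    "    for row in left:\n",
    "        extra = lookup.get(row[on], {})\n",
    "        # Copy every new column except the key itself onto the left-hand row.\n",
    "        merged.append({**row, **{k: v for k, v in extra.items() if k != on}})\n",
    "    return merged\n",
    "\n",
    "# Left rows are deliberately out of order, like our table after the update.\n",
    "rows = [{'id': 2, 'author': 'Morty'}, {'id': 1, 'author': 'Rick'}]\n",
    "new_column = [{'id': 1, 'vector': [0.1, 0.2]}, {'id': 2, 'vector': [0.3, 0.4]}]\n",
    "print(toy_merge(rows, new_column, on='id'))"
   ]
  },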
  {
   "cell_type": "markdown",
   "id": "f590fec8-0ed0-4148-b940-c81abe7b421c",
   "metadata": {},
   "source": [
    "If we look at the schema, we see that `all-MiniLM-L6-v2` produces 384-dimensional vectors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "ca9596a0-b4a0-4a5e-8d9e-967cd13b1eae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "id: int64\n",
       "author: string\n",
       "quote: string\n",
       "vector: fixed_size_list<item: float>[384]\n",
       "  child 0, item: float"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f046002c-872c-4c39-ab85-e03c3b45b477",
   "metadata": {},
   "source": [
    "## Rollback\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbfc298c-ada2-411b-925f-e53dc9d35f3c",
   "metadata": {},
   "source": [
    "Suppose we used the table and found that the `all-MiniLM-L6-v2` model doesn't produce ideal results. Instead we want to try a larger model. How do we use the new embeddings without losing the change history?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfb116e4-b3b2-4b7e-bbf8-d3e63ca2aa14",
   "metadata": {},
   "source": [
    "First, major operations are automatically versioned in LanceDB:\n",
    "\n",
    "- Version 1 is the table creation, with the initial insertion of data.\n",
    "- Versions 2 and 3 represent the update (deletion + append).\n",
    "- Version 4 adds the new vector column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "a411902b-43d0-4889-8e34-bc5f3c409726",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'version': 1,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 57, 21, 613932),\n",
       "  'metadata': {}},\n",
       " {'version': 2,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 57, 21, 626525),\n",
       "  'metadata': {}},\n",
       " {'version': 3,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 57, 27, 91378),\n",
       "  'metadata': {}},\n",
       " {'version': 4,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 58, 4, 513085),\n",
       "  'metadata': {}}]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.list_versions()"
   ]
  },
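  {
   "cell_type": "markdown",
   "id": "toy-update-sketch-note",
   "metadata": {},
   "source": [
    "The update output earlier shows Morty's quotes first and the renamed rows at the end, which fits the idea of an update being a deletion followed by an append. Here is a minimal sketch of that idea in plain Python. It is an assumption-laden toy (`toy_update` is a made-up helper, and the real bookkeeping inside LanceDB may well differ), but it shows why one logical update can leave two intermediate states behind:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "toy-update-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy model of an update as 'delete matching rows, then append rewritten rows'.\n",
    "# Illustration only: this is not LanceDB's internal implementation.\n",
    "def toy_update(rows, predicate, values):\n",
    "    kept = [r for r in rows if not predicate(r)]                 # deletion step\n",
    "    rewritten = [{**r, **values} for r in rows if predicate(r)]  # rows with new values\n",
    "    after_delete = kept                # state after the deletion\n",
    "    after_append = kept + rewritten    # state after the append\n",
    "    return after_delete, after_append\n",
    "\n",
    "rows = [\n",
    "    {'id': 1, 'author': 'Rick'},\n",
    "    {'id': 2, 'author': 'Morty'},\n",
    "    {'id': 3, 'author': 'Rick'},\n",
    "]\n",
    "after_delete, after_append = toy_update(\n",
    "    rows, lambda r: r['author'] == 'Rick', {'author': 'Richard Daniel Sanchez'}\n",
    ")\n",
    "print(after_append)"
   ]
  },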
  {
   "cell_type": "markdown",
   "id": "7bd5e954-ac0f-4973-81c6-ad6120412d40",
   "metadata": {},
   "source": [
    "We can restore version 3, the state before we added the vector column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "ad0682cc-7599-459c-bbd8-1cd1f296c845",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>author</th>\n",
       "      <th>quote</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Rick, what’s going on?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>Morty</td>\n",
       "      <td>It’s the middle of the night. What are you ta...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Ow! Ow! You’re tugging me too hard.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>Morty</td>\n",
       "      <td>Yeah, Rick, it’s great. Is this the surprise?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11</td>\n",
       "      <td>Morty</td>\n",
       "      <td>What?! A bomb?!</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id author                                              quote\n",
       "0   2  Morty                             Rick, what’s going on?\n",
       "1   4  Morty   It’s the middle of the night. What are you ta...\n",
       "2   6  Morty                Ow! Ow! You’re tugging me too hard.\n",
       "3   9  Morty      Yeah, Rick, it’s great. Is this the surprise?\n",
       "4  11  Morty                                    What?! A bomb?!"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.restore(3)\n",
    "table.head().to_pandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0a51146-40d0-4f16-9555-5ce68c2c9eee",
   "metadata": {},
   "source": [
    "Notice that we now have one more version, not one fewer. When we restore an old version, we're not deleting the version history; we're just creating a new version whose schema and data are equivalent to the restored old version. In this way, we can keep track of all of the changes and always roll back to a previous state."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "d5bfb448-20b9-45e9-90ba-8a73abb86668",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'version': 1,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 57, 21, 613932),\n",
       "  'metadata': {}},\n",
       " {'version': 2,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 57, 21, 626525),\n",
       "  'metadata': {}},\n",
       " {'version': 3,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 57, 27, 91378),\n",
       "  'metadata': {}},\n",
       " {'version': 4,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 58, 4, 513085),\n",
       "  'metadata': {}},\n",
       " {'version': 5,\n",
       "  'timestamp': datetime.datetime(2024, 12, 17, 11, 58, 27, 153807),\n",
       "  'metadata': {}}]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.list_versions()"
   ]
  },
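  {
   "cell_type": "markdown",
   "id": "toy-version-log-note",
   "metadata": {},
   "source": [
    "If it helps, the restore behaviour can be pictured as an append-only log. The class below is a toy sketch under that assumption (`ToyVersionedTable` is a made-up name, and it copies whole snapshots, which a real storage format would presumably avoid). The point is only that restoring adds a version instead of discarding later ones:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "toy-version-log-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import copy\n",
    "\n",
    "class ToyVersionedTable:\n",
    "    # Toy append-only version log (illustration only, not LanceDB's implementation).\n",
    "    def __init__(self):\n",
    "        self.versions = []\n",
    "\n",
    "    def commit(self, rows):\n",
    "        # Every write appends a new snapshot; nothing is overwritten.\n",
    "        self.versions.append(copy.deepcopy(rows))\n",
    "        return len(self.versions)  # 1-based version number\n",
    "\n",
    "    def restore(self, version):\n",
    "        # Restoring re-commits an old snapshot as the newest version.\n",
    "        return self.commit(self.versions[version - 1])\n",
    "\n",
    "toy = ToyVersionedTable()\n",
    "toy.commit([{'id': 1, 'author': 'Rick'}])\n",
    "toy.commit([{'id': 1, 'author': 'Richard Daniel Sanchez'}])\n",
    "latest = toy.restore(1)\n",
    "print(latest, len(toy.versions), toy.versions[-1])"
   ]
  },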
  {
   "cell_type": "markdown",
   "id": "6713cb53-8cb9-4235-9c55-337c311f0af6",
   "metadata": {},
   "source": [
    "### Switching Models\n",
    "\n",
    "Now we'll switch to the `all-mpnet-base-v2` model and add the vectors to the restored dataset again. Note that this step can take a couple of minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fa2950d-3002-4903-b6c3-2760ce60d079",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = SentenceTransformer(\"all-mpnet-base-v2\", device=\"cpu\")\n",
    "vectors = model.encode(df.quote.values.tolist(),\n",
    "                       convert_to_numpy=True,\n",
    "                       normalize_embeddings=True).tolist()\n",
    "embeddings = vec_to_table(vectors)\n",
    "embeddings = embeddings.append_column(\"id\", pa.array(np.arange(len(table))+1))\n",
    "table.merge(embeddings, left_on=\"id\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "694c46e0-a1c3-4869-a1eb-562f14606ad4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "id: int64\n",
       "author: string\n",
       "quote: string\n",
       "vector: fixed_size_list<item: float>[768]\n",
       "  child 0, item: float"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e4085a5-a2e7-4520-acfc-eabaae2caa7d",
   "metadata": {},
   "source": [
    "## Deletion\n",
    "\n",
    "What if the whole show was just Rick-isms? \n",
    "Let's delete any quote not said by Rick:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "9d11ddf1-b352-496c-91d7-99c70cbf304b",
   "metadata": {},
   "outputs": [],
   "source": [
    "table.delete(\"author != 'Richard Daniel Sanchez'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77d2f591-e492-423e-b995-2a18ae8cb831",
   "metadata": {},
   "source": [
    "We can see that the number of rows has been reduced to 28:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "20bcce48-a5df-43c7-9ab9-7d59a83055e9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "28"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(table)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef8457b2-1228-4a25-824e-477a07681b48",
   "metadata": {},
   "source": [
    "OK, we've had our fun. Let's get back to the full quote set by restoring version 6, the state just before the deletion:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "6e279635-75b0-400c-8b43-4aa069282ccd",
   "metadata": {},
   "outputs": [],
   "source": [
    "table.restore(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "6a65b627-57a2-43b2-8acc-3805591845ad",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "99"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(table)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae1a6ee8-8868-49de-82ab-17a0f61f3a47",
   "metadata": {},
   "source": [
    "## History\n",
    "\n",
    "We now have 8 versions of the data. We can review the operation that corresponds to each version below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "f595c9b8-91ec-48c1-9790-c40e1bd24b60",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.version"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "774f4eb0-03d4-4fda-a825-6217bf096619",
   "metadata": {},
   "source": [
    "\n",
    "Versions:\n",
    "- 1 - Create and append\n",
    "- 2 - Update (deletion)\n",
    "- 3 - Update (append)\n",
    "- 4 - Merge (vector column)\n",
    "- 5 - Restore (3)\n",
    "- 6 - Merge (new vector column)\n",
    "- 7 - Deletion\n",
    "- 8 - Restore (6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb0131e6-2b73-442a-b4c6-6976a9cf4c7e",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97a1cf79-b46b-40cd-ada0-54edef358627",
   "metadata": {},
   "source": [
    "We never had to explicitly manage the versioning. And we never had to create expensive and slow snapshots. LanceDB automatically tracks the full history of operations and supports fast rollbacks. In production this is critical for debugging issues and minimizing downtime by rolling back to a previously successful state in seconds."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "doc-venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
